Review for NeurIPS paper: Task-Agnostic Online Reinforcement Learning with an Infinite Mixture of Gaussian Processes
Clarity: The paper is overall clear and well written. I have a few suggestions to make it even easier to understand and to fix some minor inconsistencies. There is no need for the authors to respond to these points, as I think the paper is already rather clear. I am unsure what Figure 1 represents. I might have missed it, but I think pi is not defined.
Review for NeurIPS paper: Task-Agnostic Online Reinforcement Learning with an Infinite Mixture of Gaussian Processes
Reviewers agreed the paper contains interesting and sound contributions to an important problem, and is generally well written, although the model is fairly complex and the experimental domains are a bit simple. The authors are encouraged to provide further details to justify/explain certain algorithmic choices, include some of the key derivation steps (maybe with details in the appendix), and augment the experiments (like those in the rebuttal).
Task-Agnostic Online Reinforcement Learning with an Infinite Mixture of Gaussian Processes
Continuously learning to solve unseen tasks with limited experience has been extensively pursued in meta-learning and continual learning, but with restricted assumptions such as accessible task distributions, independently and identically distributed tasks, and clear task delineations. However, real-world physical tasks frequently violate these assumptions, resulting in performance degradation. This paper proposes a continual online model-based reinforcement learning approach that does not require pre-training to solve task-agnostic problems with unknown task boundaries. We maintain a mixture of experts to handle nonstationarity, and represent each different type of dynamics with a Gaussian Process to efficiently leverage collected data and expressively model uncertainty. We propose a transition prior to account for the temporal dependencies in streaming data and update the mixture online via sequential variational inference.
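To make the abstract's ingredients concrete, here is a minimal sketch of how streaming transitions might be routed to a growing set of Gaussian Process experts under a "sticky" prior that favours the previously active expert. It is an illustration under stated assumptions, not the authors' algorithm: it uses a hard (MAP) assignment in place of sequential variational inference, a zero-mean GP with a fixed RBF kernel on 1-D inputs, and a Chinese-restaurant-style prior. The names `sticky_prior`, `gp_log_pred`, `assign`, and the parameters `alpha`, `beta`, `noise` are all hypothetical.

```python
import numpy as np

def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_log_pred(x_train, y_train, x_new, y_new, noise=0.1):
    """Log predictive density of y_new at x_new under a zero-mean GP."""
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
    prior_var = rbf(x_new, x_new)[0, 0]
    if len(x_train) == 0:
        # No data yet: fall back to the GP prior predictive.
        mean, var = 0.0, prior_var + noise**2
    else:
        xt = np.asarray(x_train, dtype=float)
        yt = np.asarray(y_train, dtype=float)
        K = rbf(xt, xt) + noise**2 * np.eye(len(xt))
        k_star = rbf(xt, x_new)[:, 0]
        sol = np.linalg.solve(K, k_star)          # K^{-1} k_*
        mean = sol @ yt
        var = prior_var - k_star @ sol + noise**2
    return -0.5 * (np.log(2 * np.pi * var) + (y_new - mean) ** 2 / var)

def sticky_prior(counts, prev, alpha=1.0, beta=5.0):
    """Assumed prior over [existing experts..., new expert].

    Existing experts are weighted by their data counts, the previously
    active expert gets a bonus beta, and a new expert gets weight alpha.
    """
    w = np.append(np.asarray(counts, dtype=float), alpha)
    if prev is not None:
        w[prev] += beta
    return w / w.sum()

def assign(experts, prev, x, y, alpha=1.0, beta=5.0):
    """MAP-assign one (x, y) pair to an expert, creating one if needed."""
    counts = [len(e["x"]) for e in experts]
    log_prior = np.log(sticky_prior(counts, prev, alpha, beta))
    log_lik = [gp_log_pred(e["x"], e["y"], x, y) for e in experts]
    log_lik.append(gp_log_pred([], [], x, y))     # candidate new expert
    k = int(np.argmax(log_prior + np.array(log_lik)))
    if k == len(experts):
        experts.append({"x": [], "y": []})
    experts[k]["x"].append(x)
    experts[k]["y"].append(y)
    return k

# Usage: feed observations one at a time, carrying the last assignment.
experts, prev = [], None
for x, y in [(0.0, 0.0), (0.0, 0.0), (0.0, 10.0)]:
    prev = assign(experts, prev, x, y)
```

In this toy stream the first two observations agree with each other and the third is far outside the first expert's predictive distribution, so the sketch opens a second expert for it. A soft, variational update as described in the abstract would instead maintain a distribution over assignments rather than committing to a single expert at each step.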